NVIDIA Unveils Nemotron ASR for Low-Latency Applications
Explore NVIDIA's new Nemotron Speech ASR model designed for voice agents and live captioning with low-latency performance.
Microsoft released VibeVoice 1.5B, an open-source TTS model that generates up to 90 minutes of expressive audio with up to four speakers and supports cross-lingual and singing synthesis.
NuMind launched NuMarkdown-8B-Thinking, a reasoning-first OCR VLM that infers layout and outputs clean Markdown, ideal for RAG and document archiving.
Google has released the open-source MCP Toolbox for Databases to simplify and secure how AI agents interact with SQL databases, enabling efficient, scalable, and safe querying with minimal configuration.
Baidu releases ERNIE 4.5, a series of open-source large language models scaling from 0.3 billion to 424 billion parameters, featuring advanced architectures and strong multilingual capabilities.
Tencent introduces Hunyuan-A13B, a highly efficient open-source MoE language model with dual-mode reasoning and support for ultra-long 256K context lengths, achieving state-of-the-art benchmark results.
OpenAI has open-sourced a multi-agent customer service demo showcasing how to build specialized AI agents using the Agents SDK, featuring safety guardrails and a transparent conversational interface.
Rime has introduced Arcana and Rimecaster, open-source voice AI models trained on natural conversational speech to enhance realism and flexibility in voice applications.
Hugging Face has released nanoVLM, a compact PyTorch library that enables training a vision-language model from scratch in just 750 lines of code, combining efficiency, transparency, and strong performance.
NVIDIA has released Parakeet TDT 0.6B, an open-source ASR model that transcribes an hour of audio in just one second while achieving top accuracy benchmarks, setting a new industry standard.